Any SQL Server can play up to three distinct roles in a data replication environment:
Publication server—
The publication server (or publisher) contains the database or
databases that will be published (the magazine!). This is the source of
the data that is to be replicated to other servers. In Figure 1, the Customer table (an article in the magazine) in the AdventureWorks2008
database is the data to be published. To publish data, the database
that contains the data that will be published must first be enabled for
publishing.
Distribution server— The
distribution server (or distributor) can either be on the same server
as the publication server or on a different server (in which case it is a
remote distribution server). This server contains the distribution
database. This database, also called the store-and-forward database,
holds all the data changes that are to be forwarded from the published
database to any subscription servers that subscribe to the data. A
single distribution server can support several publication servers. The
distribution server is truly the workhorse of data replication; it is
essentially the mail system that picks up the magazine and delivers it
to the subscription holder.
Subscription server—
The subscription server (or subscriber) contains a copy of the database
or portions of the database being published (for example, the Customer table in the AdventureWorks2008
database). The distribution server sends any changes made to this table
(in the published database) to the subscription server’s copy of the Customer table. This is known as store-and-forward.
Some data replication configurations send the data to the subscription
server, and then the data is read-only. It is also possible for
subscribers (known as updating subscribers) to make updates, which are
sent back to the publisher. More on this in the Updating Subscribers
Replication Model section.
There are now new variations of this update subscriber option called peer-to-peer replication.
Peer-to-peer allows for more than one publisher of the same data
(table) at the same time! Essentially, each publisher is also a
subscriber at the same time (hence, peer-to-peer).
Along with enabling
distinct server roles (publisher, distributor, and subscriber),
Microsoft utilizes a few more magazine metaphors, including publications
and articles. A publication is a group of one or more articles and is the basic unit of data replication. An article
is simply a pointer to a single table, or a subset of rows or columns
out of a table, that will be made available for replication.
Publications and Articles
A single database can contain more than one publication. You can publish
data from tables, from database objects, from the execution of stored
procedures, and even from schema objects, such as referential integrity
constraints, clustered indexes, nonclustered indexes, user triggers,
extended properties, and collation. Regardless of what you plan to
replicate, all articles in a publication are synchronized at the same
time. Figure 2 shows an example of a publication (named Cust_Orders publication) with three articles (three tables from the AdventureWorks2008 database). You can also choose to replicate whole tables or just parts of tables via filtering.
Filtering Articles
You can create articles
within a publication in several different ways. The basic way to create
an article is to publish all the columns and rows contained in a table.
Although this is the easiest way to create articles, your business
needs might require that you publish only specific columns or certain
rows of a table. This is referred to as filtering vertically or horizontally.
When you filter vertically, you filter only specific columns, whereas
with horizontal filtering, you filter only specific rows. In addition,
SQL Server 2008 provides the added functionality of join filters and
dynamic filters.
As Figure 3 shows, you might need to replicate only a customer’s CustomerID, TerritoryID, and CustomerType to various subscribing servers around your company. In your company, the other data, such as AccountNumber,
may be restricted information that should not be replicated for general
use. For that reason, you simply create an article for data replication
that contains a subset of the Customer table that will be replicated to these other locations and excludes AccountNumber (and rowguid and ModifiedDate as well).
As another example, you might need to publish only the Customer table data for a specific customer type, such as “individual” customers ((CustomerType = 'I') or customers that are “stores” (CustomerType = 'S'). This process, as shown in Figure 4, is known as horizontal filtering.
It is possible to combine horizontal and vertical filtering, as shown in Figure 5. This way, you can weed out unneeded columns and
rows that aren’t required for replication (that is, are not needed by
the subscribers). For example, you might need only the customers that
are stores and need only CustomerID, TerritoryID, and CustomerType data to be published.
As
mentioned earlier, it is now possible to use join filters. Join filters
enable you to use the values of one article (that is, values from a
table) to determine what gets replicated from another article (that is,
what values can be associated with another table) via a join. In other
words, if you are publishing the Customer
table data based on the customers that are stores, you can extend
filtering (that is, a join filter) to replicate only those orders for
these types of customers (as shown in Figure 6).
This way, you replicate only orders for customers that are stores to a
subscriber that needs to see only this filtered data. This type of
replication can be efficient if it is done well.
You also can publish
stored procedure executions, along with their parameters, as articles.
This can be either a standard procedure execution article or a
serializable procedure execution article. The difference is that the
latter is executed as a serializable transaction; the serializable
option is recommended because it replicates the procedure execution only
if the procedure is executed within the context of a serializable
transaction. If that same stored procedure is executed from outside a
serializable transaction, changes to data in published tables are
replicated as a series of DML statements. In general, replicating stored
procedure executions gives you a major reduction in the number of SQL
statements being replicated across the network versus standard DML
statements.
For instance, if you wanted to update the Customer table for every customer via an UPDATE SQL statement, the resulting Customer table updates would be replicated as a large multi-step transaction involving at least 5,000 separate UPDATE
statements at a minimum. This number of statements would significantly
bog down your network. However, with stored procedure execution
articles, only the execution of the stored procedure is replicated to
the subscription server, and the stored procedure—not the numerous
update statements—is executed on that subscription server. Figure 7
illustrates the difference in execution described earlier. Some
subtleties when utilizing this type of data replication processing can’t
be overlooked, such as making sure the published stored procedure
behaves the same on the subscribing server side.
Many more data replication
terms are presented in this article, but it is essential that you first
learn about the different types of replication scenarios that can be
built and the reasons any of them would be desired over the others. It
is also worth noting that Microsoft SQL Server 2008 supports replication
to and from many different “heterogeneous” data sources. In other
words, OLE DB and ODBC data sources can subscribe to SQL Server
publications, and they can receive data replicated from a number of data
sources, including Microsoft Exchange, Microsoft Access, Oracle, and
DB2.